Fair Learning in Markovian Environments

Authors

  • Shahin Jabbari
  • Matthew Joseph
  • Michael Kearns
  • Jamie Morgenstern
  • Aaron Roth
Abstract

We initiate the study of fairness in reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. Our fairness constraint requires that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: despite the fact that fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take time exponential in the number of states to achieve non-trivial approximation to the optimal policy. We then provide a provably fair polynomial time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
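The fairness constraint described in the abstract can be made concrete with a small sketch. The following is an illustrative check, not code from the paper: `action_probs`, `q_values`, and `eps` are hypothetical names, with `eps` standing in for the approximate relaxation the abstract mentions (setting `eps=0` gives the exact notion).

```python
def is_fair(action_probs, q_values, eps=0.0):
    """Illustrative sketch of the fairness constraint from the abstract.

    An action distribution is fair if it never assigns higher probability
    to an action whose long-term (discounted) value is lower. With eps > 0,
    value gaps of at most eps are tolerated (approximate fairness).
    """
    n = len(q_values)
    for a in range(n):
        for b in range(n):
            # Action b is strictly better than a (beyond the eps slack),
            # yet a is preferred: the constraint is violated.
            if q_values[b] > q_values[a] + eps and action_probs[a] > action_probs[b]:
                return False
    return True
```

Under exact fairness (`eps=0`), preferring the lower-value action in even one state violates the constraint; the approximate notion only forbids this when the value gap exceeds `eps`, which is what makes a polynomial-time algorithm possible.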


Related Papers

Effect of random telegraph noise on entanglement and nonlocality of a qubit-qutrit system

We study the evolution of entanglement and nonlocality of a non-interacting qubit-qutrit system under the effect of random telegraph noise (RTN) in independent and common environments, in both Markovian and non-Markovian regimes. We investigate the dynamics of the qubit-qutrit system for different initial states; such systems could exist in distant astronomical objects. A monotone decay of the nonlocalit...

Reinforcement Learning in Markovian and Non-Markovian Environments

This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. ...

Adding Memory to XCS

We add internal memory to the XCS classifier system. We then test XCS with internal memory, named XCSM, in non-Markovian environments with two and four aliasing states. Experimental results show that XCSM can easily converge to optimal solutions in simple environments; moreover, XCSM's performance is very stable with respect to the size of the internal memory involved in learning. However, the...

Solving Problems in Partially Observable Environments with Classifier Systems (Experiments on Adding Memory to XCS)

XCS is a classifier system recently introduced by Wilson that differs from Holland's framework in that classifier fitness is based on the accuracy of the prediction instead of the prediction itself. According to the original proposal, XCS has no internal message list as traditional classifier systems do; hence XCS learns only reactive input/output mappings that are optimal in Markovian environment...

متن کامل

Non-Deterministic Policies In Markovian Processes

Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinic...


Journal:
  • CoRR

Volume: abs/1611.03071

Publication date: 2016